Building and Using a Richly Annotated Interlinear Diachronic Corpus: The Case of Old High German Tatian
Abstract
This paper reports on the development and evaluation of a historical corpus designed to support detailed empirical studies of the interaction between information structure and syntax in Old High German (OHG). The creation and exploration of this corpus are part of a more general investigation into the role of information-structural factors in explaining word order variation and change in the Germanic languages. The paper also describes the corpus design principles, methodologies, relevant formats and specifications, and the technical infrastructure employed in creating the corpus, as well as its accessibility through ANNIS, a linguistic database for information structure.
Similar resources
Challenges in Modelling a Richly Annotated Diachronic Corpus of German
This paper presents the design and architecture of a diachronic corpus of German. We describe the corpus architecture with a focus on the use and restrictions of XML as the data exchange and storage format. In our approach, a relational database will supplement the XML representation to support sophisticated search and presentation facilities. This is a report about ongoing work; the architectu...
From Historic Books to Annotated XML: Building a Large Multilingual Diachronic Corpus
This paper introduces our approach towards annotating a large heritage corpus, which spans over 100 years of alpine literature. The corpus consists of over 16.000 articles from the yearbooks of the Swiss Alpine Club, 60% of which represent German texts, 38% French, 1% Italian and the remaining 1% Swiss German and Romansh. The present work describes the inherent difficulties in processing a mult...
Rhetorical Relations and Verb Placement in Early Germanic Languages: Evidence from the Old High German Tatian Translation (9th century)
This is a first attempt at describing word order variation in the early Germanic languages in a dynamic model of discourse relations as outlined in Segmented Discourse Representation Theory (SDRT) by Asher and Lascarides (2003). The study aims at investigating the interrelation between information structure and discourse organisation in the text of the Old High German Tatian translation (9th century)...
DeutschDiachronDigital - A Diachronic Corpus of German
There are many digitized historical German texts from all periods (Old High German to Modern German). It is, however, difficult to carry out diachronic research because
- there are differences in digitization source (original or edition)
- there are differences in digitization quality
- the texts are stored in different (and, sometimes, incompatible) formats
- many texts are not publicly availab...
Gearing the Discursive Practice to the Evolution of Discipline: Diachronic Corpus Analysis of Stance Markers in Research Articles’ Methodology Section
Despite widespread interest and research among applied linguists to explore metadiscourse use, very little is known of how metadiscourse resources have evolved over time in response to the historically developing practices of academic communities. Motivated by such an ambition, the current research drew on a corpus of 874315 words taken from three leading journals of applied linguistics in orde...
Journal: TAL
Volume: 50
Publication year: 2009